On searching misspelled collections

Authors

  • Jason J. Soo
  • Ophir Frieder
Abstract

Over two thirds of misspelled queries are caused by transformation errors: insertion, deletion, replacement, and inversion (Li, Duan, & Zhai, 2012; Pollock & Zamora, 1984). Spelling-correction approaches must address these common transformation errors, but many cannot do so without training data. For example, the United States Holocaust Memorial Museum (USHMM) has a document collection comprising 13 languages. The volume of queries is too low, relative to the size of the collection, to be used for training. Worse, should a supervised approach be deployed, the model might overfit to frequently queried languages, biasing results against minority languages. Although there are many kinds of spelling-correction algorithms, three prominent types are:
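As a side illustration of the four transformation errors named at the start of the abstract (not of the three algorithm types, which the excerpt does not list), here is a minimal Python sketch. The function name `classify_single_error` and the sample words are assumptions made for illustration; they do not come from the paper, and the sketch is not the authors' method. It only checks whether one insertion, deletion, replacement, or inversion of adjacent characters turns an intended word into the typed one.

```python
def classify_single_error(intended: str, typed: str) -> str:
    """Name the single transformation that turns `intended` into `typed`.

    Hypothetical helper for illustration only. Returns "none" when the
    words match and "other" when no single transformation explains the
    difference.
    """
    if intended == typed:
        return "none"

    n, m = len(intended), len(typed)

    # Find the first position where the two words differ.
    i = 0
    while i < min(n, m) and intended[i] == typed[i]:
        i += 1

    # One extra character in the typed word.
    if m == n + 1 and intended[i:] == typed[i + 1:]:
        return "insertion"

    # One character missing from the typed word.
    if m == n - 1 and intended[i + 1:] == typed[i:]:
        return "deletion"

    if m == n:
        # Exactly one character differs.
        if intended[i + 1:] == typed[i + 1:]:
            return "replacement"
        # Two adjacent characters are swapped.
        if (
            i + 1 < n
            and intended[i] == typed[i + 1]
            and intended[i + 1] == typed[i]
            and intended[i + 2:] == typed[i + 2:]
        ):
            return "inversion"

    return "other"


if __name__ == "__main__":
    for typed in ["musseum", "musum", "musaum", "musuem", "mzsezm"]:
        print(typed, "->", classify_single_error("museum", typed))
```

A query that needs more than one such transformation falls into the "other" bucket here; a fuller treatment would use an edit distance that counts adjacent swaps.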

Similar resources

A Fast and Accurate Method for Approximate String Search

This paper proposes a new method for approximate string search, specifically candidate generation in spelling error correction, which is a task as follows. Given a misspelled word, the system finds words in a dictionary, which are most “similar” to the misspelled word. The paper proposes a probabilistic approach to the task, which is both accurate and efficient. The approach includes the use of...

The Effect of Specialized Multimedia Collections on Web Searching

Multimedia Web searching is a significant information activity for many people. Major Web search engines are critical resources in people’s efforts to locate relevant online multimedia information. It is therefore important that we understand how searchers are utilizing these Web information systems in their quest to retrieve multimedia information to design effective Web systems in support of ...

字形相似別字之自動校正方法 (Automatic Correction for Graphemic Chinese Misspelled Words) [In Chinese]

No matter that learning Chinese as a first or second language, a quite important issue, misspelled words, needs to be addressed. Many studies proposed that there was a suggestion of correcting misspelled words for students who are still schooling as well as a suggestion of teaching and learning strategies of Chinese characters for teachers. Although in schooling, it does to prevent students who...

Methods and Procedures of Sampling, Preservation and Identification for Fish Taxonomy Studies

Taxonomy has two important roles: to name organisms and to classify them. Classifications are useful because they contain information about relationships. All species in the same genus should share many behavioral, biochemical, ecological and biological properties because they are closely related evolutionarily. The effect of pollution on a species at one location should be similar to the effect ...

Applying Inference Networks to Multiple Collection Searching

The paper describes how to use inference networks to solve two problems in searching multiple collections: collection selection and result merging. The effectiveness of the approaches is demonstrated with the INQUERY system and 3 gigabyte TREC collections.

Journal:
  • JASIST

Volume 66

Publication date: 2015